Prediction of Protein Solubility in Escherichia coli Using Logistic Regression

Authors

  • Armando A. Diaz
  • Emanuele Tomba
  • Reese Lennarson
  • Rex Richard
  • Miguel J. Bagajewicz
  • Roger G. Harrison
Abstract

In this article we present a new and more accurate model for predicting the solubility of proteins overexpressed in the bacterium Escherichia coli. The model uses the statistical technique of logistic regression. To build it, we used 32 parameters that could potentially correlate well with solubility. In addition, the protein database was expanded compared to those used previously. We tested several different implementations of logistic regression, with varying results. The best implementation, which is the one we report, exhibits excellent overall prediction accuracies: 94% for the model and 87% by cross-validation. For comparison, we also tested discriminant analysis using the same parameters and obtained a less accurate prediction (69% cross-validation accuracy for the stepwise forward plus interactions model). Biotechnol. Bioeng. 2009;9999: 1–10. © 2009 Wiley Periodicals, Inc.
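The distinction the abstract draws between model accuracy (measured on the data used for fitting) and cross-validation accuracy (measured on held-out data) can be illustrated with a minimal sketch. This is not the authors' model: the 32 parameters and the protein database are not given here, so the example uses synthetic two-feature data, and every name in it (`fit_logistic`, `cross_val_accuracy`, `make_toy_data`, the learning rate, the number of folds) is an assumption chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    # Class 1 stands in for "soluble", class 0 for "insoluble" (toy labels).
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0

def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

def cross_val_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validation: each sample is predicted by a model
    that was fitted without it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(y)

def make_toy_data(n=200, seed=1):
    # Synthetic data only: one informative feature, one noise feature.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

X, y = make_toy_data()
w = fit_logistic(X, y)
train_acc = accuracy(w, X, y)
cv_acc = cross_val_accuracy(X, y)
print(f"training accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```

With many parameters relative to the number of proteins, the accuracy on the fitting data tends to be optimistic, which is why a held-out estimate such as the cross-validated figure is usually the more conservative number to quote.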

Similar resources

Prediction of Protein Solubility in Escherichia Coli Using Discriminant Analysis, Logistic Regression, and Artificial Neural Network Models

Recombinant DNA technology is important in the mass production of proteins for academic, medical, and industrial use, and the prediction of the solubility of proteins is a significant part of it. However, the protein solubility when overexpressed in a host organism is difficult to predict. Thus, a model capable of accurately estimating the likelihood of proteins to form insoluble inclusion bodi...

Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction

The solubility of recombinant protein expressed in Escherichia coli often represents the production yield. However, up-to-date, instances of successful production of soluble recombinant proteins in E. coli expression system with high yield remain scarce. This is mainly due to the difficulties in improving the overall production capacity, as most of the well-established strategies usually involv...

Fuzzy Hybrid least-Squares Regression Approach to Estimating the amount of Extra Cellular Recombinant Protein A from Escherichia coli BL21

Introduction: Immune Protein A is a component with a vast spectrum of biochemical, biological and medical usages. The coding gene of this protein was extracted from Staphylococcus aureus and was cloned and expressed in Escherichia coli bacteria. Suitable statistical methods are utilized to optimize expression conditions for evaluating experiment accuracy, guarantee the accuracy of subsequent ...

Enhancement of Solubility and Specific Activity of a Cu/Zn Superoxide Dismutase by Co-expression with a Copper Chaperone in Escherichia coli

Background: Human Cu/Zn superoxide dismutase (hSOD1) is an antioxidant enzyme with potential as a therapeutic agent. However, heterologous expression of hSOD1 has remained an issue due to Cu2+ insufficiency at protein active site, leading to low solubility and enzymatic activity. Objectives: The effect of co-expressed human copper chaperone (hCCS) to enhance the solubility and enzymatic act...


Publication date: 2009